Automatic extraction of cue phrases for important sentences in lecture speech and automatic lecture speech summarization
Authors
Abstract
We automatically extract summaries of spoken classroom lectures. This paper presents a novel method for sentence-extraction-based automatic speech summarization. We propose a technique for extracting "cue phrases for important sentences" (CPs), i.e., phrases that frequently appear in important sentences. We formulate CP extraction as a word-sequence labeling problem and use Conditional Random Fields (CRFs) [1] to perform the labeling. Automatic summarization that uses the CP extraction results as features yields precisions of 0.603 and 0.556 on manual transcriptions and on Automatic Speech Recognition (ASR) results, respectively. By combining the CP-derived features with traditional features [2, 3], namely surface linguistic features (repeated words, words repeated in the slide text, and term frequency (tf)) and prosodic features (speech power and duration), we obtained better summarization performance: a κ-value of 0.380, an F-measure of 0.539, and a ROUGE-4 score of 0.709.
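The abstract frames CP extraction as labeling a word sequence with a CRF. The sketch below illustrates only the decoding step of a linear-chain CRF: given per-word (emission) scores and label-to-label (transition) scores, Viterbi search returns the highest-scoring label sequence. It assumes a BIO-style tag set (`O`, `B-CP`, `I-CP`), and the weights in `WORD_WEIGHTS` and `TRANSITIONS` are hypothetical, hand-set toy values; a trained CRF would learn weights over much richer features from annotated lecture transcripts, and the paper's actual tag scheme and feature set are not given here. The helper `cue_phrase_feature` is likewise a hypothetical illustration of turning the labeling into a sentence-level feature for extraction.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO-style tag set: outside, beginning, and inside of a cue phrase.
LABELS = ["O", "B-CP", "I-CP"]

# Hypothetical hand-set word/label weights (a trained CRF would learn these).
WORD_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("important", "B-CP"): 2.0,
    ("point", "I-CP"): 2.0,
    ("is", "I-CP"): 1.0,
}

# Hypothetical transition weights; unlisted label pairs score 0.0.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("START", "I-CP"): -10.0,  # a sentence should not start inside a phrase
    ("O", "I-CP"): -10.0,      # I-CP should only follow B-CP or I-CP
    ("B-CP", "I-CP"): 0.5,
    ("I-CP", "I-CP"): 0.3,
}


def emission(word: str, label: str) -> float:
    """Unary (per-word) score in log space; a small prior favours O."""
    prior = 0.5 if label == "O" else 0.0
    return prior + WORD_WEIGHTS.get((word.lower(), label), 0.0)


def viterbi(words: List[str]) -> List[str]:
    """Highest-scoring label sequence under emission + transition scores."""
    if not words:
        return []
    # best[i][y]: best total score of any labelling of words[:i+1] ending in y.
    best: List[Dict[str, float]] = [
        {y: TRANSITIONS.get(("START", y), 0.0) + emission(words[0], y) for y in LABELS}
    ]
    back: List[Dict[str, str]] = [{}]  # back[i][y]: best previous label for y
    for i in range(1, len(words)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev, score = max(
                ((p, best[i - 1][p] + TRANSITIONS.get((p, y), 0.0)) for p in LABELS),
                key=lambda pair: pair[1],
            )
            scores[y] = score + emission(words[i], y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def cue_phrase_feature(words: List[str]) -> int:
    """Hypothetical sentence feature: 1 if any cue phrase is detected, else 0."""
    return int("B-CP" in viterbi(words))


print(viterbi("The important point is recursion".split()))
# ['O', 'B-CP', 'I-CP', 'I-CP', 'O']
```

The large negative scores on `START → I-CP` and `O → I-CP` encode the BIO constraint that an inside tag may only continue a phrase opened by `B-CP`. In a full system, the per-sentence CP feature would be combined with the other features the abstract lists (tf, repeated words, power, duration) when scoring sentences for extraction.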
Similar resources
Comparison of different machine learning methods for extractive speech-to-speech summarization of Persian without using transcripts
In this paper, extractive speech summarization using different machine learning algorithms is investigated. Speech summarization deals with extracting important and salient segments from speech so that speech files can be accessed, searched, extracted, and browsed more easily and at lower cost. In this paper, a new method for speech summarization without using automatic speech recognitio...
Improving automatic summarization of Persian texts using natural language processing methods and a similarity graph
A significant amount of available information is stored in textual databases, which contain large collections of documents from different sources (such as news, articles, books, emails, and web pages). The increasing visibility and importance of this class of information motivates us to develop better automatic evaluation tools for textual resources. The automatic summarization of tex...
Intonational phrases for speech summarization
Extractive speech summarization approaches select relevant segments of spoken documents and concatenate them to generate a summary. The extraction unit chosen, whether a sentence, syntactic constituent, or other segment, has a significant impact on the overall quality and fluency of the summary. Even though sentences are the unit of choice for most extractive speech summarizers, in this paper...
Spontaneous speech consolidation for spoken language applications
This paper describes work done as part of the International Workshop on Speech Summarization for Information Extraction and Machine Translation (IWSpS) on spoken language processing, including summarization, machine translation, and question answering on lecture speech in the Translanguage English Database (TED) corpus. The hypotheses of lecture speech obtained by automatic speech recogn...
A Study on Statistical Methods for Automatic Speech Summarization
This dissertation proposes a new automatic speech summarization method based on word extraction. In this method, the set of words that maximizes a summarization score, which indicates the appropriateness of the summarization, is extracted from automatically transcribed speech. This extraction is performed sentence by sentence according to a target compression ratio, using a dynamic programming technique. The extra...